Learning Feature Taxonomies for Case Indexing

Authors

  • Kalyan Moy Gupta
  • David W. Aha
  • Philip G. Moore
Abstract

Taxonomic case retrieval systems significantly outperform standard conversational case retrieval systems. However, their feature taxonomies, which are the principal reason for their superior performance, must be developed manually, a laborious and error-prone process. In an earlier paper, we proposed a framework for automatically acquiring features and organizing them into taxonomies to reduce the taxonomy acquisition effort. In this paper, we focus on the second part of this framework: automated feature organization. We introduce TAXIND, an algorithm for inducing taxonomies from a given set of features; it implements a step in our FACIT framework for knowledge extraction. TAXIND builds taxonomies using a novel bottom-up procedure that operates on a matrix of asymmetric similarity values. We introduce measures for evaluating taxonomy induction performance and use them to evaluate TAXIND’s learning performance on two case bases. We investigate both a knowledge-poor and a knowledge-rich variant of TAXIND. While both outperform a baseline approach that does not induce taxonomies, there is no significant performance difference between the two TAXIND variants. Finally, we discuss how a more comprehensive representation for features should improve measures on TAXIND’s learning and performance tasks.
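To make the idea of bottom-up induction over an asymmetric similarity matrix concrete, the following is a minimal illustrative sketch in Python. It is not TAXIND itself: the abstract does not specify the procedure, so the function name `induce_taxonomy`, the `threshold` parameter, the column-sum "generality" heuristic, and the example features and similarity values are all assumptions made purely for illustration. Here `sim[i][j]` is assumed to express how strongly feature `i` is subsumed by feature `j`, so in general `sim[i][j] != sim[j][i]`.

```python
from typing import Dict, List, Optional


def induce_taxonomy(
    features: List[str],
    sim: List[List[float]],
    threshold: float = 0.5,
) -> Dict[str, Optional[str]]:
    """Map each feature to a parent feature, or None for a taxonomy root.

    Hypothetical sketch, not the published algorithm. sim[i][j] is assumed
    to be the degree to which feature i is subsumed by feature j.
    """
    n = len(features)
    # Assumed heuristic: a feature is more general the more strongly
    # the other features are subsumed by it (sum over its column).
    generality = [
        sum(sim[i][j] for i in range(n) if i != j) for j in range(n)
    ]
    # Bottom-up: visit the most specific features first (ties by index).
    order = sorted(range(n), key=lambda k: (generality[k], k))
    rank = {idx: pos for pos, idx in enumerate(order)}

    parent: Dict[str, Optional[str]] = {}
    for i in order:
        # Only strictly more general features may be parents, which
        # rules out cycles in the induced structure.
        candidates = [
            j for j in range(n)
            if rank[j] > rank[i] and sim[i][j] >= threshold
        ]
        best = max(candidates, key=lambda j: sim[i][j], default=None)
        parent[features[i]] = features[best] if best is not None else None
    return parent


# Made-up example data (not from the paper's case bases).
example_features = ["printer problem", "paper jam", "toner low"]
example_sim = [
    [1.0, 0.2, 0.2],  # "printer problem" is weakly subsumed by the others
    [0.9, 1.0, 0.1],  # "paper jam" is strongly subsumed by "printer problem"
    [0.8, 0.1, 1.0],  # "toner low" is strongly subsumed by "printer problem"
]
example_taxonomy = induce_taxonomy(example_features, example_sim)
```

In this toy example, the two specific features attach to "printer problem", which becomes the root. Restricting parents to strictly more general features is one simple way to keep the result acyclic; the actual algorithm may handle this differently.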


Similar resources

Towards Acquiring Case Indexing Taxonomies From Text

Taxonomic case-based reasoning is a conversational case-based reasoning methodology that employs feature subsumption taxonomies for incremental case retrieval. Although this approach has several benefits over standard retrieval approaches, methods for automatically acquiring these taxonomies from text documents do not exist, which limits its widespread implementation. To accelerate and simplify ...


Large-Scale Many-Class Prediction via Flat Techniques

Prediction problems with huge numbers of classes are becoming more common. While class taxonomies are available in certain cases, we have observed that simple flat learning and classification, via index learning and related techniques, offers significant efficiency and accuracy advantages. In the PASCAL challenge on large-scale hierarchical text classification, the accuracies we obtained ranked...


Exploiting Term, Predicate, and Feature Taxonomies in Propositionalization and Propositional Rule Learning

Knowledge representations using semantic web technologies often provide information which translates to explicit term and predicate taxonomies in relational learning. We show how to speed up the propositionalization by orders of magnitude, by exploiting such taxonomies through a novel refinement operator used in the construction of conjunctive relational features. Moreover, we accelerate the su...


Learning to Refine Indexing by Introspective Reasoning

A significant problem for case-based reasoning (CBR) systems is determining the features to use in judging case similarity for retrieval. We describe research that addresses the feature selection problem by using introspective reasoning to learn new features for indexing. Our method augments the CBR system with an introspective reasoning component which monitors system performance to detect poo...


Automatic MeSH term assignment and quality assessment

For computational purposes documents or other objects are most often represented by a collection of individual attributes that may be strings or numbers. Such attributes are often called features and success in solving a given problem can depend critically on the nature of the features selected to represent documents. Feature selection has received considerable attention in the machine learning...




Publication date: 2004